"References: "-Headerline

Konrad Wilhelm

Apr 26, 1997

Is there any possibility in (Free) Agent to get the
"References: "-line from articles where only the header has been
loaded?
I would like to load only those articles that are answers to
articles written by me, i.e. that have the message-ID of one of my
articles in their "References: "-Headerline.
k.
--
Konrad Wilhelm <wil...@uni-muenster.de>
Suedstr. 3, D48329 Havixbeck

pls.se...@my.sig

Apr 26, 1997

;On Sat, 26 Apr 1997 21:14:43 GMT, p_ha...@ibm.net.DELETE_THIS (Paul Hantom) wrote:

>.On Sat, 26 Apr 1997 14:16:16 GMT, Konrad Wilhelm posted <3361bd8...@news.uni-muenster.de>:

>> Is there any possibility in (Free) Agent to get the
>> "References: "-line from articles where only the header has been
>> loaded?

>It is in the *.dat file. Happy hunting!

A "References:" header isn't in the .dat file unless the body has been
downloaded.
--
Jim [mailto:JLBradley#worldnet.att.net]
Checked [Free] Agent help and need more? http://sd.znet.com/~lance/
Want an answer fast? http://www.dejanews.com/forms/dnsetfilter.html
Internet beginner? http://www.which.net/nonsub/sitemap/using.html
More help with binaries? http://shell.ihug.co.nz/~ijh/index2.html

pls.se...@my.sig

Apr 27, 1997

;On Sun, 27 Apr 1997 03:28:40 GMT, p_ha...@ibm.net.DELETE_THIS (Paul Hantom) wrote:

>.On Sat, 26 Apr 1997 22:29:41 GMT, pls.se...@my.sig posted <336381a9...@netnews.worldnet.att.net>:

>> ;On Sat, 26 Apr 1997 21:14:43 GMT, p_ha...@ibm.net.DELETE_THIS (Paul Hantom) wrote:

>> >.On Sat, 26 Apr 1997 14:16:16 GMT, Konrad Wilhelm posted <3361bd8...@news.uni-muenster.de>:

>> >> Is there any possibility in (Free) Agent to get the "References: "
>> >> -line from articles where only the header has been loaded?

>> >It is in the *.dat file. Happy hunting!

>> A "References:" header isn't in the .dat file unless the body has been
>> downloaded.

>Jim, Jim, Jim... then explain to me how Agent manages to thread headers
>without bodies!?

Paul, Paul, Paul... that's a different question.

As you can see for yourself by examining a .dat file, the "References"
header is not present for any message unless the body was downloaded.

Sherlog

Apr 27, 1997

On Sat, 26 Apr 1997 in <3361bd8...@news.uni-muenster.de>
Konrad Wilhelm typed thusly:

>Is there any possibility in (Free) Agent to get the
>"References: "-line from articles where only the header has been
>loaded?

>I would like to load only those articles that are answers to
>articles written by me, i.e. that have the message-ID of one of my
>articles in their "References: "-Headerline.

If you don't mind a bit of programming - yes, there is a possibility.

Agent stores the *hash* of the referenced message ids. The hash of the
immediate predecessor's message id can be found in the .IDX entry at
offset 10h; hashes of the other ids in the "References:" line are in
the "tabbed" string (aka Agent header) in the .DAT.

The "tabbed" string is the only info Agent stores in the .DAT for
messages without bodies and appears to have five fields separated by
tabs; it is terminated with a CR/LF. The fields are:
| <subject>     as displayed - tabs etc. replaced with space!
| <author>
| <message-id>
| <references>  <-- !!!
| <more>        "0" except for multi-parts
For multi-parts things are a bit trickier, but the references field
shouldn't be affected. If the references field is not empty it may look
like this (without the pipe character, that's for coloring):
|# 2 aba356 ff460c48
IOW: a "garden fence" (the # sign), a count, and <count> hashes in hex
without leading zeroes, separated by spaces. It appears that Agent does
not always store this info - probably only if the immediate predecessor
(whose hash is in the .IDX entry) is not present.
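
The layout described above can be sketched in C. This is only an
illustration of the description in this post, not Agent's own code: the
function names `tab_field` and `parse_references` are made up here, the
count is assumed to be decimal, and the sample subject, author and
message-id in the usage note are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return a pointer to tab-separated field `index` (0-based) of one
 * "tabbed" header line and store its length in *len.
 * Returns NULL if the line has fewer fields than requested. */
const char *tab_field(const char *line, int index, size_t *len)
{
    const char *p = line;
    assert(line != NULL && len != NULL);
    while (index-- > 0) {
        p = strchr(p, '\t');
        if (!p) return NULL;
        p++;                              /* step past the tab */
    }
    *len = strcspn(p, "\t\r\n");          /* field ends at tab or CR/LF */
    return p;
}

/* Parse a references field such as "# 2 aba356 ff460c48".
 * Returns the number of hashes stored in out[], 0 for an empty field,
 * or -1 if the field does not match the described layout. */
int parse_references(const char *field, size_t len, uint32_t *out, int max)
{
    char buf[1024];
    char *p, *end;
    long count;
    int n = 0;

    if (len == 0) return 0;
    if (len >= sizeof buf || field[0] != '#') return -1;
    memcpy(buf, field, len);
    buf[len] = '\0';

    count = strtol(buf + 1, &end, 10);    /* assumption: count is decimal */
    if (end == buf + 1 || count < 0 || count > max) return -1;
    p = end;
    while (n < count) {
        unsigned long h = strtoul(p, &end, 16);   /* hashes are hex */
        if (end == p) return -1;          /* fewer hashes than announced */
        out[n++] = (uint32_t)h;
        p = end;
    }
    return n;
}
```

For a line like "Re: test<TAB>Someone<TAB><id@example.invalid><TAB># 2
aba356 ff460c48<TAB>0", `tab_field(line, 3, &len)` would select the
references field and `parse_references` would yield the two hashes. The
values are kept as raw 32-bit patterns; whether they should be read as
signed is left to the caller.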

BTW, the first four fields (subject..) seem to have a limit of 1000
characters each, although Agent cuts subjects at 500 chars in replies.

A utility that does what you envision would scan the .IDX for unread
(and 'unignored') headers that are direct replies to your posts and
mark them for download. A more sophisticated one (that gets follow-ups
a bit farther down also) would have to look into the .DAT as well.
In any case a three-stage operation is required:
- get headers
- run utility (making sure the group is not selected in Agent!)
- get marked message bodies
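
The "direct reply" test in the middle stage might look roughly like the
sketch below. It relies only on what is stated in this post (the
predecessor's hash sits at offset 10h of the .IDX entry). Everything
else is an assumption: that the field is 4 bytes wide and little-endian,
that a stored 0 means "no predecessor", and that the hashes of your own
message ids are already known. The entry size and all other fields are
not covered, and `read_le32` and `is_direct_reply` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 1 if the predecessor hash of one .IDX entry matches any of the
 * n hashes in mine[], else 0.
 * Assumption: the hash is a 4-byte little-endian field at offset 10h. */
int is_direct_reply(const unsigned char *entry, const uint32_t *mine, size_t n)
{
    uint32_t parent;
    size_t i;

    assert(entry != NULL);
    parent = read_le32(entry + 0x10);
    if (parent == 0) return 0;            /* assumption: 0 = no predecessor */
    for (i = 0; i < n; i++)
        if (mine[i] == parent) return 1;
    return 0;
}
```

A real utility would additionally have to walk the .IDX entries, check
the read/ignore flags, and set the "marked for download" flag, none of
which is described here.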

For more info on the layout of Agent's database files have a look at
the source of Jonzonk's AGloBS:
ftp://ftp.netcom.com/pub/jo/jonzonk/agent/aglobs
or grep this group for "JEHAD" (via AGloBS/DejaNews).

*BUT*, why don't you just set a watch filter on your name?

HTH, Sherlog
--
Sherlog(at)fun.horx.de  >>>>>>>>>>>>>>> J E H A D <<<<<<<<<<<<<<<<
[ Most Impatient   ]
[ Vocal Programmer ]    Joint Effort on Hacking Agent's Database
[ In This Group    ]    && enhancing functionality in various ways

Sherlog

Apr 27, 1997

On Sun, 27 Apr 1997 in <336335c9...@hillm.demon.co.uk>
Mark Hill typed thusly:

>Previously, Paul Hantom wrote:
>
>>.On Sat, 26 Apr 1997 22:29:41 GMT, pls.se...@my.sig posted <336381a9...@netnews.worldnet.att.net>:
>>

>>>A "References:" header isn't in the .dat file unless the body has been
>>>downloaded.
>>
>>Jim, Jim, Jim... then explain to me how Agent manages to thread headers
>>without bodies!?
>

>[Free] Agent does not thread on the References header; it threads on
>the information returned by the news server in response to the "XOVER"
>or "XHDR References" command.

And what do you think these commands return, and where the server gets
it from? Agent immediately hashes the info, though, so it does not show
up as plain text in the files.
--
Sherlog(at)fun.horx.de
[ Most Impatient ]
[ Vocal Programmer ]
[ In This Group ]

pls.se...@my.sig

Apr 27, 1997

;On Sun, 27 Apr 1997 11:17:51 GMT, ma...@hillm.demon.co.uk (Mark Hill) wrote:

>Previously, Paul Hantom wrote:

>>.On Sat, 26 Apr 1997 22:29:41 GMT, pls.se...@my.sig posted <336381a9...@netnews.worldnet.att.net>:

>>>A "References:" header isn't in the .dat file unless the body has been
>>>downloaded.

>>Jim, Jim, Jim... then explain to me how Agent manages to thread headers
>>without bodies!?

>[Free] Agent does not thread on the References header; it threads on
>the information returned by the news server in response to the "XOVER"
>or "XHDR References" command.

Actually it seems to thread based on a 32-bit hash of the message ids.

>If the news server returns bogus
>information, even if the headers contain the correct information,
>[Free] Agent threads incorrectly.

It also appears to thread incorrectly when two different message ids
generate the same hash value; I suspect that "subject" threading, when
enabled, may have the same failing.

>It is possible to
[>find the back references in Agent for a header-only article]
>compose a follow-up, save, move to a folder, examine the
>headers. This can't be done in Free Agent.

All but the folder part can. It is possible to examine the message in
the outbox using an editor (the "References" header is there), or post
the follow-up (?to a test group) and download it.

The "References" header generated by [Free] Agent when following up to
a message for which the body was not downloaded contains only one item
(the id of that message) no matter how many references were listed by
the original message. This appears to lead to a work-around (awkward
admittedly) for the "line # too long" problem (if it still occurs).

Note: I bracketed the "back references" line (above) because I don't
understand what you are describing. Could you elaborate?

Sherlog

Apr 27, 1997

On Sun, 27 Apr 1997 in <3363453f...@netnews.worldnet.att.net>
pls.se...@my.sig typed thusly:
...

>Actually it seems to thread based on a 32-bit hash of the message ids.
...and...

>It also appears to thread incorrectly when two different message ids
>generate the same hash value; I suspect that "subject" threading, when
>enabled, may have the same failing.

The hash uses only 26 bits (actual range -33554392..+33554392), which
is not very much when you consider that the Birthday Paradoxon is at
work here.

I do not know how many messages have to come to the party for the
likelihood of a collision to be 50%, but I expect only a couple ten
thousand or so. Maybe someone with a bit of statistical knowledge could
enlighten us... Paul, Paul, Paul?

BTW, as a wild guess I would say that 33554393 got used because it is
the largest prime that 2 MByte of RAM can buy - the largest number
representable in a 2 MB sieve is 33554433.

CRC32 would probably have been better as it has 6 bits more.
(How big a party for 32 bits?)

Mark Hill

Apr 27, 1997

Previously, Sherlog wrote:

>On Sun, 27 Apr 1997 in <336335c9...@hillm.demon.co.uk>
>Mark Hill typed thusly:
>

>>[Free] Agent does not thread on the References header; it threads on
>>the information returned by the news server in response to the "XOVER"
>>or "XHDR References" command.
>

>And what do you think these commands return, and where the server gets
>it from? Agent immediately hashes the info, though, so it does not show
>up as plain text in the files.

I know of at least one server which, under some circumstances, does
not return the correct (or complete) info to a "XHDR References"
command. [Free] Agent hashes and threads on this incorrect info, even
though the correct info is in the headers returned when Agent gets the
article.

Mark
--
To reply to this message, delete nothing from my address.
All UCE will be bounced.

pls.se...@my.sig

Apr 27, 1997

;On Sun, 27 Apr 1997 14:09:14 GMT, she...@fun.horX.de (Sherlog) wrote:

>The hash uses only 26 bits (actual range -33554392..+33554392), which
>is not very much when you consider that the Birthday Paradoxon is at
>work here.

>I do not know how many messages have to come to the party for the
>likelihood of a collision to be 50%, but I expect only a couple ten
>thousand or so. Maybe someone with a bit of statistical knowledge could
>enlighten us... Paul, Paul, Paul?

Assuming that hash values are equally distributed over the range, the
calculation is within the capability of your version number generator.
The assumption, however, is probably false. Besides, I would find a
failure rate exceeding 1/10^5 unacceptable, so unacceptable that I
would be re-evaluating GmM/r^2.

>CRC32 would probably have been better as it has 6 bits more.

My first guess, as you recall; I was delighted by your announcement of
the correct answer.

BTW: I have seen only two JEHAD posts, neither containing information
I know is already available. Is there a cumulative central repository
of bits and pieces?

Sherlog

Apr 27, 1997

On Sun, 27 Apr 1997 in <33656518...@netnews.worldnet.att.net>
pls.se...@my.sig typed thusly:

>;On Sun, 27 Apr 1997 14:09:14 GMT, she...@fun.horX.de (Sherlog) wrote:
>
>>The hash uses only 26 bits (actual range -33554392..+33554392), which
>>is not very much when you consider that the Birthday Paradoxon is at
>>work here.
>
>>I do not know how many messages have to come to the party for the
>>likelihood of a collision to be 50%, but I expect only a couple ten
>>thousand or so. Maybe someone with a bit of statistical knowledge could
>>enlighten us... Paul, Paul, Paul?
>Assuming that hash values are equally distributed over the range, the
>calculation is within the capability of your version number generator.

Using a rule of thumb, 2^(26/2) == 8K messages are sufficient to make a
collision likely (assuming equal distribution etc.) Utilizing all 32
bits (CRC32) would not require any code changes at all and raise this
to 64K, which - although still not safe - would be at least beyond the
'official' limit of 32K messages per group, and thus a problem only for
the most ardent archivers.
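
The rule of thumb can be checked with a few lines of C. This is a
sketch under the same uniform-distribution assumption, which the thread
itself doubts; the function name `birthday_threshold` is made up. It
multiplies out the probability that no two of n values collide and
stops when the collision probability reaches p.

```c
#include <assert.h>

/* Smallest n such that n values drawn uniformly from `space` possible
 * hash values contain at least one collision with probability >= p. */
unsigned long birthday_threshold(double space, double p)
{
    double no_collision = 1.0;    /* P(no collision) among n values */
    unsigned long n = 1;

    assert(space >= 1.0 && p > 0.0 && p < 1.0);
    while (1.0 - no_collision < p) {
        /* value n+1 must avoid the n values already present */
        no_collision *= 1.0 - (double)n / space;
        n++;
    }
    return n;
}
```

With p = 0.5 this gives 23 for the classic 365-day case, roughly 9,600
for a space of 2^26 values and roughly 77,000 for 2^32, which is the
same order of magnitude as the 8K and 64K figures above.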

OTOH one must not forget that we're dealing with Usenet here, so that
this problem is unlikely _to be noticed_ in practice (or if so,
correctly attributed).

>The assumption, however, is probably false. Besides, I would find a
>failure rate exceeding 1/10^5 unacceptable, so unacceptable that I
>would be re-evaluating GmM/r^2.

Make that 1/10^4. Regarding your citing Newton's generalization of
Kepler's 3rd law (no, I do not say g-words in public), we certainly
have a Many Body Problem here which is known to be intractable. ;)

One reasonable precaution would be to keep message counts in important
groups low and do any archiving in folders. This way Agent would not
falsely reject any articles due to message id hash collision, and when
moving the messages to the archive folder Agent will at least warn you
about what it thinks is a duplicate (IIRC). If a collision does in fact
occur once in a while this can be remedied by changing the older
message's id slightly (and preserving the real id in an X-header).

This applies only if (in the course of your now certain re-evaluation)
you do not join the heretics, of course.

>>CRC32 would probably have been better as it has 6 bits more.
>My first guess, as you recall; I was delighted by your announcement of
>the correct answer.
>
>BTW: I have seen only two JEHAD posts, neither containing information
>I know is already available. Is there a cumulative central repository
>of bits and pieces?

Yes, if you regard DejaNews as such. :(

I realize the necessity of a central repository/focal point, but it is
unlikely that I will be able to do anything about it anytime soon (my
earlier optimism regarding T.A.P was somewhat premature).

If someone would put up something like the Tweak Agent Pages, I would
be happy to contribute; but for the time being I have resigned myself
to posting the odd tidbit now and then, when it fits in a discussion.

Luck++, Sherlog

Mark Hill

Apr 27, 1997

Previously, Paul Hantom wrote:

>.On Sat, 26 Apr 1997 22:29:41 GMT, pls.se...@my.sig posted <336381a9...@netnews.worldnet.att.net>:
>
>>A "References:" header isn't in the .dat file unless the body has been
>>downloaded.
>
>Jim, Jim, Jim... then explain to me how Agent manages to thread headers
>without bodies!?

[Free] Agent does not thread on the References header; it threads on
the information returned by the news server in response to the "XOVER"
or "XHDR References" command. If the news server returns bogus
information, even if the headers contain the correct information,
[Free] Agent threads incorrectly.

It is possible to find the back references in Agent for a header-only
article - compose a follow-up, save, move to a folder, examine the
headers. This can't be done in Free Agent.

Mark

Sherlog

May 4, 1997

On Sun, 27 Apr 1997 20:04:44 GMT in <3365b0db...@J.E.H.A.D>
Sherlog typed thusly:

>On Sun, 27 Apr 1997 in <33656518...@netnews.worldnet.att.net>
>pls.se...@my.sig typed thusly:
>
>>;On Sun, 27 Apr 1997 14:09:14 GMT, she...@fun.horX.de (Sherlog) wrote:
>>
>>>The hash uses only 26 bits (actual range -33554392..+33554392), which
>>>is not very much when you consider that the Birthday Paradoxon is at
>>>work here.
>>

>>Assuming that hash values are equally distributed over the range, the
>>calculation is within the capability of your version number generator.
>
>Using a rule of thumb, 2^(26/2) == 8K messages are sufficient to make a
>collision likely (assuming equal distribution etc.) Utilizing all 32
>bits (CRC32) would not require any code changes at all and raise this
>to 64K, which - although still not safe - would be at least beyond the
>'official' limit of 32K messages per group,

I have found a way to improve the performance of Agent's hash function
by introducing two slight changes to the current code.

Currently it works similar to this:
:// hash function for Message-ID: and Subject: header contents
:long hash (const char *s) {
:  if (!s || !*s) return 0;     // empty string hashes to 0
:  long v = 0;
:  while (*s) { v = ((v<<7) + *s++) % 33554393L; }
:  return v ? v : 1;            // if v==0 return 1 instead
:}
The while loop needs to be changed like so:
:  while (*s) { v = (_lrotl(v,7) + *s++) % 2147482949L; }
                     ^^^^^^^^^^^           ^^^^^^^^^^^
i.e. the 'lossy' shift needs to be changed into a rotation (cannot use
a bigger prime otherwise) and a new prime must be chosen. The actual
differences to the machine code are minimal, only five bytes need to be
changed (per occurrence of the function in the .EXE) - four bytes for
the prime, and one bit in the mod/rm byte of the SHL instruction to
make it ROL.
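
For experimenting outside Agent, both variants can be emulated with
fixed-width types. The sketch below is not Agent's code: it restates
the two loops from this post under the assumptions that `long` was 32
bits wide, that signed overflow wrapped around as it does on x86, that
plain `char` was signed, and that `_lrotl` works on an unsigned 32-bit
value. The names `rotl32`, `as_signed`, `hash_current` and
`hash_proposed` are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 < n < 32). */
uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* Reinterpret a 32-bit pattern as a two's-complement signed value. */
int32_t as_signed(uint32_t x)
{
    return (x <= (uint32_t)INT32_MAX) ? (int32_t)x
                                      : -(int32_t)(uint32_t)~x - 1;
}

/* Emulation of the current function: lossy shift, modulus 33554393.
 * The result can be negative because the shifted value may wrap. */
int32_t hash_current(const char *s)
{
    int32_t v = 0;
    if (!s || !*s) return 0;                 /* empty string hashes to 0 */
    while (*s) {
        int32_t c = (signed char)*s++;       /* assumption: signed char */
        uint32_t t = ((uint32_t)v << 7) + (uint32_t)c;  /* wraps mod 2^32 */
        v = as_signed(t) % 33554393;
    }
    return v ? v : 1;                        /* never return 0 for text */
}

/* Emulation of the proposed change: rotation, modulus 2147482949. */
uint32_t hash_proposed(const char *s)
{
    uint32_t v = 0;
    if (!s || !*s) return 0;
    while (*s) {
        int32_t c = (signed char)*s++;       /* assumption: signed char */
        v = (rotl32(v, 7) + (uint32_t)c) % 2147482949u;
    }
    return v ? v : 1;
}
```

For short strings the two functions agree (for example, "ab" hashes to
12514 in both); they diverge once bits start to fall off the top of the
shifted value.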

My computer is currently looking at a couple of primes to see which
fares best; in the meantime here are two 'prime' candidates:
: 2147482949 == 7FFFFD45 hex, shift 7 - 5th collision at 176256
: 2147482877 == 7FFFFCFD hex, shift 9 - 5th collision at 161171
Both passed the smaller tests (up to 57k messages) without collision.

For comparison: Agent's current function had 52 collisions for the 57k
test, with collisions starting to show up at about 5k messages.
(Reminder: the probability of collision increases non-linearly with the
number of messages.)
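
One simple way to count collisions in a batch of hashes is to sort them
and count repeats. This is a sketch, not the harness behind the numbers
above: how those 52 collisions were counted is not stated, and
`count_collisions` is a made-up name. Genuinely duplicate message ids
would have to be removed before hashing, otherwise they are counted as
collisions too.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for unsigned 32-bit values. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sort hashes[] in place and return how many entries repeat a value
 * that already occurred earlier in the sorted order. */
size_t count_collisions(uint32_t *hashes, size_t n)
{
    size_t i, dup = 0;

    assert(hashes != NULL);
    qsort(hashes, n, sizeof *hashes, cmp_u32);
    for (i = 1; i < n; i++)
        if (hashes[i] == hashes[i - 1]) dup++;
    return dup;
}
```

Feeding it the hashes of the first k message ids for increasing k shows
where collisions start to appear for a given prime and shift.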

When the tests are completed I will post results plus detailed
instructions for patching (plus discussion of gotchas).

Ray Delio

Jun 18, 1997

On Sun, 04 May 1997 20:59:56 GMT, she...@fun.horX.de (Sherlog) wrote:

This is WILD...

NOTE:

I have to reach back some 3(+) score years to match these references.
My comments try to phrase it in non-technical language.

>>>;On Sun, 27 Apr 1997 14:09:14 GMT, she...@fun.horX.de (Sherlog) wrote:
>>>
>>>>The hash uses only 26 bits (actual range -33554392..+33554392), which
>>>>is not very much when you consider that the Birthday Paradoxon is at
>>>>work here.

1. The statistical "reality" whereby -
(Paradoxon must be a derivative of Oxymoron or vice versa.)

For every person who joins the party, there is one less day left in the
pool of days on which no birthday match can occur.
This results in the non-obvious fact that only 20 or so people (23, to
be exact) are required before it becomes likely that at least 2 will
have the same birthday.

< snip > Some provocative Mathematical (Statistical) Technology.

Last wake up call follows.


| Most Impatient | *Joint Effort for Hacking Agent Database*
| Vocal Engineer | "In The Land of the Blind, a one eyed man is King!"
| You never met. | Or is he crazy? [Reality is a Paradox / Illusion]

Ray Delio, (aka secret-agent)
